
Project Team Alpage


Section: Scientific Foundations

Coreference resolution

Participants : Pascal Denis, Philippe Muller, Laurence Danlos.

An important challenge in understanding natural language texts is correctly identifying the discourse entities mentioned in them: persons, locations, abstract objects, and so on. In addition to identifying individual referential expressions (e.g., Nicolas Sarkozy, Neuilly, l'UMP) and typing them properly (e.g., Nicolas Sarkozy is a person, Neuilly is a location), the task is to determine which other mentions these expressions are coreferential with. Part of the difficulty is that natural languages provide many ways to refer to the same entity, including pronouns such as il and ses and definite descriptions such as le président, and such expressions are highly ambiguous. The identification of coreferential links and of other anaphoric links (such as “associative anaphora”) plays a key role in various applications, such as information extraction and retrieval, automatic summarization, and question answering. This central role has been recognized by the inclusion of coreference resolution in several international evaluation campaigns, beginning with the Message Understanding Conferences (in particular MUC-6 and MUC-7; see, respectively, http://www.cs.nyu.edu/cs/faculty/grishman/muc6.html and http://www.itl.nist.gov/iaui/894.02/related_projects/muc/proceedings/muc_7_toc.html), and more recently Automatic Content Extraction (ACE; http://www.nist.gov/speech/tests/ace/) and Anaphora Resolution Evaluation (ARE; http://clg.wlv.ac.uk/events/ARE/). The creation and distribution of corpora developed as part of these campaigns have significantly boosted research in automatic coreference resolution. In particular, they have made it possible to apply machine learning techniques (mostly supervised ones) to the problem.
This in turn has led to the development of systems that are both more robust and more precise, making their integration into larger applications more realistic. Some of the best systems based on supervised learning methods are described in [123], [99], [95], [100], [94], [88]. Note that a few attempts have also been made at using unsupervised techniques (mostly clustering methods) for the task [74], [101], but these systems are still far from reaching the performance of their supervised counterparts.
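
To make the shape of the task concrete, the following is a minimal illustrative sketch and not a description of any of the systems cited above. It assumes that some upstream component (for instance a supervised pairwise classifier) has already decided which pairs of mentions corefer, and it only shows how such pairwise links can be closed into entity chains with a union-find structure. The function name `build_chains`, the mention list, and the hand-written links are all hypothetical.

```python
from collections import defaultdict


def build_chains(num_mentions, links):
    """Group mention indices into coreference chains.

    `links` is a list of (i, j) pairs of mention indices judged coreferent.
    The chains are the connected components of the resulting graph.
    """
    parent = list(range(num_mentions))

    def find(i):
        # Follow parent pointers to the representative, halving the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    chains = defaultdict(list)
    for i in range(num_mentions):  # iterate in textual order
        chains[find(i)].append(i)
    return list(chains.values())


# Hypothetical mentions, in order of appearance in a text.
mentions = ["Nicolas Sarkozy", "Neuilly", "il", "le président", "l'UMP"]
# Hand-written links standing in for a classifier's pairwise decisions.
links = [(0, 2), (2, 3)]

chains = build_chains(len(mentions), links)
named_chains = [[mentions[i] for i in chain] for chain in chains]
```

Here `named_chains` groups Nicolas Sarkozy, il, and le président into one entity and leaves Neuilly and l'UMP as singletons. Taking the transitive closure of pairwise decisions is only one possible way of forming entities; it is used here because it is the simplest to state.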